41 research outputs found

    Blind Inpainting with Object-aware Discrimination for Artificial Marker Removal

    Medical images often contain artificial markers added by doctors, which can negatively affect the accuracy of AI-based diagnosis. To address this issue and recover the missing visual contents, inpainting techniques are needed. However, existing inpainting methods require manual mask input, limiting their application scenarios. In this paper, we introduce a novel blind inpainting method that automatically completes visual contents without specifying masks for target areas in an image. Our proposed model includes a mask-free reconstruction network and an object-aware discriminator. The reconstruction network consists of two branches that predict the regions corrupted by artificial markers and simultaneously recover the missing visual contents. The object-aware discriminator relies on the recognition capabilities of a dense object detector to ensure that no markers can be detected in any local region of the reconstructed image. As a result, the reconstructed image is as close as possible to the clean one. Our proposed method is evaluated on different medical image datasets covering multiple imaging modalities, namely ultrasound (US), magnetic resonance imaging (MRI), and electron microscopy (EM), demonstrating that it is effective and robust against various unknown missing-region patterns.

    Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

    When applying a pre-trained 2D-to-3D human pose lifting model to an unseen target dataset, large performance degradation is commonly encountered due to domain shift. We observe that the degradation is caused by two factors: 1) the large distribution gap over global positions of poses between the source and target datasets due to varying camera parameters and settings, and 2) the deficient diversity of local structures of poses in training. To this end, we combine global adaptation and local generalization in PoseDA, a simple yet effective framework of unsupervised domain adaptation for 3D human pose estimation. Specifically, global adaptation aims to align global positions of poses from the source domain to the target domain with a proposed global position alignment (GPA) module, and local generalization is designed to enhance the diversity of 2D-3D pose mapping with a local pose augmentation (LPA) module. These modules bring significant performance improvement without introducing additional learnable parameters. The LPA module enhances the diversity of 3D poses following an adversarial training scheme consisting of 1) an augmentation generator that generates the parameters of pre-defined pose transformations and 2) an anchor discriminator that ensures the realism and quality of the augmented data. Our approach is applicable to almost all 2D-3D lifting models. PoseDA achieves 61.3 mm of MPJPE on MPI-INF-3DHP under a cross-dataset evaluation setup, improving upon the previous state-of-the-art method by 10.2%.
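
The MPJPE figure quoted above is the mean per-joint position error, that is, the average Euclidean distance between predicted and ground-truth joints. The following is a minimal sketch of that metric only, not of PoseDA itself; the function name and the toy three-joint poses are illustrative assumptions and do not come from the paper.

```python
import math


def mpjpe(predicted, target):
    """Mean per-joint position error: average Euclidean distance between matching joints."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("pose lists must be non-empty and of equal length")
    total = sum(math.dist(p, t) for p, t in zip(predicted, target))
    return total / len(predicted)


# Toy 3-joint poses in millimetres (made-up values, not data from the paper).
gt = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (100.0, 200.0, 0.0)]
pred = [(0.0, 0.0, 30.0), (100.0, 40.0, 0.0), (100.0, 200.0, 50.0)]

# Per-joint errors are 30, 40 and 50 mm, so the mean is 40 mm.
error_mm = mpjpe(pred, gt)
```

In a cross-dataset setup the same function would be averaged over every frame of the target test set.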

    A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

    Deep learning has the potential to revolutionize sports performance, with applications ranging from perception and comprehension to decision. This paper presents a comprehensive survey of deep learning in sports performance, focusing on three main aspects: algorithms, datasets and virtual environments, and challenges. Firstly, we discuss the hierarchical structure of deep learning algorithms in sports performance which includes perception, comprehension and decision while comparing their strengths and weaknesses. Secondly, we list widely used existing datasets in sports and highlight their characteristics and limitations. Finally, we summarize current challenges and point out future trends of deep learning in sports. Our survey provides valuable reference material for researchers interested in deep learning in sports applications

    Experimental investigation on the flexural mechanical behaviour of an immersion joint

    The immersed tunnelling technique is commonly used for river or sea crossings worldwide. Seismic safety criteria of immersed tunnels involve the shear stiffness, axial stiffness, flexural stiffness, and opening deformations of the immersion joints. Therefore, it is necessary to conduct a mechanical analysis of the joint between the immersed tunnel elements. An experiment on an immersion joint is presented in this paper, mainly dealing with the experiment design and the axial and flexural behaviour of the joint. The geometric scale of the experiment is 1:10. The model joint includes two 3.8 m × 1.15 m × 1.2 m segments with a rubber gasket and horizontal steel shear keys between them. Different levels of water pressure were considered because water depth changes significantly in real projects. The displacements of the immersion joint under multi-level loads were measured and analysed considering the hyper-elastic property of a GINA gasket. The mechanical behaviour of a GINA gasket is found to be significantly affected by both flexural and axial loadings. Moreover, the flexural stiffness ratio of the joint with respect to that of the tunnel element in service states ranges from 1/27 to 1/272. The results are useful for the further numerical analysis of immersion joints, and more related publications are expected in the future.

    DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models

    Image-based fashion design with AI techniques has attracted increasing attention in recent years. We focus on a new fashion design task, where we aim to transfer a reference appearance image onto a clothing image while preserving the structure of the clothing image. It is a challenging task since there are no reference images available for the newly designed output fashion images. Although diffusion-based image translation or neural style transfer (NST) has enabled flexible style transfer, it is often difficult to maintain the original structure of the image realistically during the reverse diffusion, especially when the referenced appearance image greatly differs from the common clothing appearance. To tackle this issue, we present a novel diffusion model-based unsupervised structure-aware transfer method to semantically generate new clothes from a given clothing image and a reference appearance image. Specifically, we decouple the foreground clothing with semantic masks generated automatically from conditioned labels. The mask is further used as guidance in the denoising process to preserve the structure information. Moreover, we use a pre-trained vision Transformer (ViT) for both appearance and structure guidance. Our experimental results show that the proposed method outperforms state-of-the-art baseline models, generating more realistic images in the fashion design task. Code and demo can be found at https://github.com/Rem105-210/DiffFashion

    Devil in the Number: Towards Robust Multi-modality Data Filter

    To filter web-scale multi-modality datasets appropriately, it is crucial to employ suitable filtering methods that boost performance and reduce training costs. For instance, the LAION papers employ the CLIP score filter to select data with CLIP scores surpassing a certain threshold. T-MARS, on the other hand, achieves high-quality data filtering by detecting and masking text within images and then filtering by CLIP score. Through analyzing the dataset, we observe a significant proportion of redundant information, such as numbers, in the textual content. Our experiments on a subset of the data reveal the profound impact of these redundant elements on the CLIP scores. A logical approach is to re-evaluate the CLIP scores after eliminating these influences. Experimentally, our text-based CLIP filter outperforms the top-ranked method on the "small scale" of DataComp (a data filtering benchmark) on ImageNet distribution shifts, achieving a 3.6% performance improvement. The results also demonstrate that our proposed text-masked filter outperforms the original CLIP score filter when selecting the top 40% of the data. The impact of numbers on CLIP and their handling provide valuable insights for improving the effectiveness of CLIP training, including language rewrite techniques. Comment: ICCV 2023 Workshop: TNGCV-DataCom
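
The filtering idea described above (remove numbers from the caption, rescore, then keep the top fraction of samples) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' pipeline: `strip_numbers` and `select_top_fraction` are hypothetical helper names, and the scores are made-up stand-ins for CLIP similarities that a real pipeline would compute between each image and its number-stripped caption.

```python
import re


def strip_numbers(caption):
    """Remove digit runs (including decimal or thousands separators) and tidy whitespace."""
    without_digits = re.sub(r"\d+(?:[.,]\d+)*", " ", caption)
    return re.sub(r"\s+", " ", without_digits).strip()


def select_top_fraction(samples, scores, fraction):
    """Keep the highest-scoring `fraction` of samples (always at least one)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, int(len(samples) * fraction))
    ranked = sorted(zip(scores, samples), key=lambda pair: pair[0], reverse=True)
    return [sample for _, sample in ranked[:keep]]


cleaned = strip_numbers("Nike Air Max 270 size 10.5 item 4,399")

# Hypothetical scores standing in for CLIP similarities on the cleaned captions.
sample_ids = ["a", "b", "c", "d", "e"]
placeholder_scores = [0.1, 0.5, 0.3, 0.9, 0.2]
kept = select_top_fraction(sample_ids, placeholder_scores, 0.4)
```

Ranking by fraction rather than by a fixed score threshold mirrors the "top 40% of the data" setting mentioned in the abstract.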

    Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation

    Learning-based methods have dominated 3D human pose estimation (HPE) tasks, with significantly better performance on most benchmarks than traditional optimization-based methods. Nonetheless, 3D HPE in the wild is still the biggest challenge for learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. Optimization-based methods, on the other hand, estimate results case by case and can therefore predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M with a minMPJPE of 51.4 mm without training on any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on the 3DPW dataset with a PA-MPJPE of 42.6 mm in cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW.
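
For multi-hypothesis estimators, minMPJPE is commonly reported as the error of the hypothesis closest to the ground truth. The sketch below illustrates that best-of-N convention only; it is not the ZeDO pipeline, and the function names and toy two-joint poses are assumptions made for illustration.

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance between matching joints of two poses."""
    return sum(math.dist(p, t) for p, t in zip(predicted, target)) / len(target)


def min_mpjpe(hypotheses, target):
    """Best-of-N error: the MPJPE of the hypothesis closest to the ground truth."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    return min(mpjpe(h, target) for h in hypotheses)


# Toy 2-joint ground truth and two candidate poses in millimetres (made-up values).
gt = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0)]
hypothesis_a = [(60.0, 0.0, 0.0), (60.0, 100.0, 0.0)]   # every joint off by 60 mm
hypothesis_b = [(0.0, 0.0, 10.0), (0.0, 100.0, 30.0)]   # joints off by 10 and 30 mm

best_mm = min_mpjpe([hypothesis_a, hypothesis_b], gt)
```

Because the minimum is taken with knowledge of the ground truth, minMPJPE is an upper bound on achievable accuracy for a hypothesis set, not the error of a single deployed prediction.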

    Early warning analysis of mountain flood disaster based on Copula function risk combination

    Mountain torrent disaster prevention is the focus of flood control and disaster reduction in China. Critical rainfall is an important indicator that determines the success or failure of mountain torrent disaster early warning. In this paper, the M-Copula function is introduced, the multi-dimensional joint distribution of critical rainfall is constructed, and the joint distribution of rainfall and peak rainfall intensity is analyzed. Taking A village in Xinxian County as an example, the critical rainfall for the combined probability is calculated, and the critical rainfall corresponding to the flash flood disaster water level, the pre-shift warning and the sharp-shift warning is analyzed. The results show that the flood peak modulus calculated by Yishangfan group is 8.89, which has certain rules for the flood peak modulus of rivers in hilly areas: the larger the basin area, the smaller the flood peak modulus, and the smaller the area, the larger the flood peak modulus. The calculated design flow of 533 m³/s is reasonable. It is reasonable and reliable to select the M-Copula function as the connection function to fit the joint distribution of rainfall and peak rainfall intensity, which can provide theoretical support for flash flood disaster warning in other regions.
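
A copula links two marginal distributions into a joint one, which is how a joint probability for rainfall and peak rainfall intensity can be obtained. The abstract does not give the form of its M-Copula, so the sketch below swaps in a single Gumbel-Hougaard copula as a stand-in; the dependence parameter and the marginal probabilities are made-up values, not results from the study.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel-Hougaard copula C(u, v); theta >= 1, and theta = 1 gives independence."""
    if theta < 1:
        raise ValueError("theta must be >= 1")
    if not (0 < u < 1 and 0 < v < 1):
        raise ValueError("u and v must lie strictly between 0 and 1")
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def joint_exceedance(u, v, theta):
    """P(U > u, V > v) = 1 - u - v + C(u, v) for uniform marginals U and V."""
    return 1.0 - u - v + gumbel_copula(u, v, theta)


# u and v are the marginal non-exceedance probabilities of the two variables
# (for example rainfall depth and peak intensity); the values are illustrative.
independent = joint_exceedance(0.9, 0.8, 1.0)
dependent = joint_exceedance(0.9, 0.8, 2.0)
```

With positive dependence (theta above 1), the probability that both variables exceed their thresholds together is higher than under independence, which is why the choice of copula matters for warning thresholds.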

    PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

    Existing 3D human pose estimators face challenges in adapting to new datasets due to the lack of 2D-3D pose pairs in training sets. To overcome this issue, we propose the Multi-Hypothesis Pose Synthesis Domain Adaptation (PoSynDA) framework to bridge this data disparity gap in the target domain. Specifically, PoSynDA uses a diffusion-inspired structure to simulate the 3D pose distribution in the target domain. By incorporating a multi-hypothesis network, PoSynDA generates diverse pose hypotheses and aligns them with the target domain. To do this, it first utilizes target-specific source augmentation to obtain target domain distribution data from the source domain by decoupling the scale and position parameters. The process is then further refined through the teacher-student paradigm and low-rank adaptation. In extensive comparisons on benchmarks such as Human3.6M and MPI-INF-3DHP, PoSynDA demonstrates competitive performance, even comparable to the target-trained MixSTE model (Zhang et al., 2022). This work paves the way for the practical application of 3D human pose estimation in unseen domains. The code is available at https://github.com/hbing-l/PoSynDA. Comment: Accepted to ACM Multimedia 2023; 10 pages, 4 figures, 8 tables

    High prevalence of vitamin D deficiency among children aged 1 month to 16 years in Hangzhou, China

    Background: Recent studies have suggested that vitamin D deficiency in children is widespread, but the vitamin D status of Chinese children has seldom been investigated. The objective of the present study was to survey the serum levels of 25-hydroxyvitamin D [25(OH)D] in more than 6,000 children aged 1 month to 16 years in Hangzhou (latitude: 30°N), the capital of Zhejiang Province, southeast China. Methods: Children aged 1 month to 16 years who came to the child health care department of our hospital, the Children's Hospital affiliated to Zhejiang University School of Medicine, for health examination had blood drawn for 25(OH)D measurement. Serum 25(OH)D levels were determined by direct enzyme-linked immunosorbent assay and categorized as < 25, < 50, and < 75 nmol/L. Results: A total of 6,008 children aged 1 month to 16 years participated in this cross-sectional study. The subjects were divided into subgroups according to age: 0-1 y, 2-5 y, 6-11 y and 12-16 y, representing the infancy, preschool, school age and adolescence stages respectively. The highest mean level of serum 25(OH)D was found in the 0-1 y stage (99 nmol/L) and the lowest in the 12-16 y stage (52 nmol/L). Accordingly, the prevalence of serum 25(OH)D levels of < 75 nmol/L and < 50 nmol/L was lowest among infants (33.6% and 5.4% respectively) and highest among adolescents (89.6% and 46.4% respectively). The mean levels of serum 25(OH)D and the prevalence of vitamin D deficiency varied with the seasons. In winter and spring, more than 50% of school age children and adolescents had a 25(OH)D level of < 50 nmol/L. With a threshold of < 75 nmol/L, all of the adolescents (100%) and 93.7% of school age children had low 25(OH)D levels in winter. Conclusions: The prevalence of vitamin D deficiency and insufficiency among children in Hangzhou, Zhejiang Province, is high, especially among children aged 6-16 years. We suggest that the recommendation for vitamin D supplementation in Chinese children should be extended to adolescence.
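
The prevalence figures above are percentages of children whose serum 25(OH)D falls below each cut-off (< 25, < 50 and < 75 nmol/L). A minimal sketch of that calculation follows; the function name and the sample measurements are made up for illustration and are not data from the study.

```python
def prevalence_below(levels_nmol_per_l, threshold):
    """Percentage of measurements strictly below the given threshold."""
    if not levels_nmol_per_l:
        raise ValueError("at least one measurement is required")
    below = sum(1 for level in levels_nmol_per_l if level < threshold)
    return 100.0 * below / len(levels_nmol_per_l)


# Cut-offs named in the abstract, in nmol/L.
THRESHOLDS = (25, 50, 75)

# Hypothetical serum 25(OH)D measurements in nmol/L.
sample_levels = [18.0, 42.0, 49.5, 60.0, 74.0, 80.0, 99.0, 120.0]

# The categories are cumulative: every value below 25 also counts as below 50 and 75.
summary = {t: prevalence_below(sample_levels, t) for t in THRESHOLDS}
```

Grouping the measurements by age band or season before calling the function would reproduce the kind of subgroup comparison reported in the results.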